Best of LessWrong 2021

John made his own COVID-19 vaccine at home using open source instructions. Here's how he did it and why.

* Psychotic “delusions” are more about holding certain genres of idea with a socially inappropriate amount of intensity and obsession than holding a false idea. Lots of non-psychotic people hold false beliefs (e.g. religious people). And, interestingly, it is absolutely possible to hold a true belief in a psychotic way.
* I have observed people during psychotic episodes get obsessed with the idea that social media was sending them personalized messages (quite true; targeted ads are real) or the idea that the nurses on the psych ward were lying to them (they were).
* Preoccupation with the revelation of secret knowledge, with one’s own importance, with mistrust of others’ motives, and with influencing others' thoughts or being influenced by others' thoughts, are classic psychotic themes.
* And it can be a symptom of schizophrenia when someone’s mind gets disproportionately drawn to those themes. This is called being “paranoid” or “grandiose.”
* But sometimes (and I suspect more often with more intelligent/self-aware people) the literal content of their paranoid or grandiose beliefs is true!
  * sometimes the truth really has been hidden!
  * sometimes people really are lying to you or trying to manipulate you!
  * sometimes you really are, in some ways, important! sometimes influential people really are paying attention to you!
  * of course people influence each others' thoughts -- not through telepathy but through communication!
* a false psychotic-flavored thought is "they put a chip in my brain that controls my thoughts." a true psychotic-flavored thought is "Hollywood moviemakers are trying to promote progressive values in the public by implanting messages in their movies."
* These thoughts can come from the same emotional drive, they are drawn from dwelling on the same theme of "anxiety that one's own thoughts are externally influenced", they are in a deep sense mere arbitrary verbal representations of a single mental phenomenon...
Mark Xu
Alignment researchers should think hard about switching to working on AI Control

I think Redwood Research’s recent work on AI control really “hits it out of the park”, and they have identified a tractable and neglected intervention that can make AI go a lot better. Obviously we should shift labor until the marginal unit of research in either area decreases P(doom) by the same amount. I think that implies lots of alignment researchers should shift to AI control type work, and would naively guess that the equilibrium is close to 50/50 across people who are reading this post. That means if you’re working on alignment and reading this, I think there’s probably a ~45% chance it would be better for your values if you instead were working on AI control!

For this post, my definitions are roughly:

* AI alignment is the task of ensuring the AIs “do what you want them to do”.
* AI control is the task of ensuring that if the AIs are not aligned (e.g. don’t always “do what you want” and potentially want to mess with you), then you are still OK and can use them for economically productive tasks (an important one of which is doing more alignment/control research).

Here are some thoughts, arguments, and analogies (epistemic status: there is no “hidden content”; if you don’t find the literal words I wrote persuasive you shouldn’t update. In particular, just update on the words and don't update about what my words imply about my beliefs.):

* Everything is in degrees. We can “partially align” some AIs, and things will be better if we can use those AIs for productive tasks, like helping with alignment research. The thing that actually matters is “how aligned are the AIs” + “how aligned do they need to be to use them for stuff”, so we should also focus on the 2nd thing.
* If you were a hedge fund, and your strategy for preventing people from stealing your data and starting a new hedge fund was “we will make the hedge fund a super fun place to work and interview people carefull
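The “shift labor until the marginal unit of research in either area decreases P(doom) by the same amount” claim above is the standard equal-marginal-returns condition. As a quick aside, it can be stated in symbols (notation mine, not Mark's):

```latex
% Split a fixed research budget L between alignment labor x_A and control labor x_C,
% and let f(x_A, x_C) denote P(doom) under that allocation.
% At an interior optimum of  min f(x_A, x_C)  subject to  x_A + x_C = L:
\[
  \frac{\partial f}{\partial x_A}\bigg|_{(x_A^*,\, x_C^*)}
  \;=\;
  \frac{\partial f}{\partial x_C}\bigg|_{(x_A^*,\, x_C^*)}
\]
% i.e. the last unit of labor buys the same reduction in P(doom) in either area;
% if one derivative were more negative, shifting labor toward that area would lower f further.
```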
Of Greater Agents and Lesser Agents

How do more sophisticated decision-makers differ from less sophisticated decision-makers in their behaviour and values? Smarter, more sophisticated decision-makers engage in more and more complex commitments, including meta-commitments not to commit. Consequently, the values and behaviour of these more sophisticated decision-makers ("Greater Agents") are systematically biased compared to less sophisticated decision-makers ("Lesser Agents").

*******************************

Compared to Lesser Agents, the Greater Agents are more judgemental and (self-)righteous, punish naiveté, are more long-term oriented, adaptive, malleable, self-modifying, and legibly trustworthy, practice more virtue-signalling, are more strategic, engage in more self-reflection & metacognition, engage in more thinking and less doing and more symbolic reasoning, are more consistent & 'rational' in their preferences, like money & currency more and sacred values less, value engagement in thinking over doing, are engaged in more "global" conflicts [including multiverse-wide conflicts through acausal trade], are less empirical, more rational, more universalistic in their morals, and more cosmopolitan in their aesthetics, are less likely to be threatened and more willing to martyr themselves, are willing to admit their values' origins and willing to barter on their values, engage in less frequent but more lethal war, love formal protocols, and practice cryptographic magika.

* Greater Agents punish Dumb Doves; Greater Agents are Judgemental.
* Higher-order cooperators will punish naive lower-order cooperators for cooperating with bad actors. cf. Statistical Physics of Human Cooperation.
* Greater Agents are more 'rational'; have more 'internal cooperation' and more self-Judgement. They love money and bartering.
* They are more coherent & consistent. They more often Take Sure Gains and Avoid Sure Losses. See Crystal Healing, Why Not Subagents.
* They adhere less to Sacred values; are more willing and a
Dan Braun
In which worlds would AI Control (or any other agenda which relies on non-trivial post-training operation) prevent significant harm?

When I bring up the issue of AI model security to people working in AI safety, I’m often met with something of the form “yes, this is a problem. It’s important that people work hard on securing AI models. But it doesn’t really affect my work”. Using AI Control (an area which has recently excited many in the field) as an example, I lay out an argument for why it might not be as effective an agenda as one might think after considering the realities of our cyber security situation.

1. AI Control concerns itself with models that intentionally try to subvert their developers.
2. These models are likely to be very generally capable and capable of causing significant harm without countermeasures.
3. Leading cyber-capable institutions would likely expend significant resources and political capital to steal these models or steal enough insights to reproduce such models.
4. If the weights or insights are stolen, work on AI control will not prevent these models from causing significant harm.
5. Current AI developers are not on track to be able to defend against high-priority operations from leading cyber-capable institutions in the coming years.
6. Therefore, AI control will only be useful in the coming years under one (or more) of these conditions:
   1. Models that scheme are unlikely to be generally capable/dangerous enough to be a high-priority target for leading cyber-capable institutions.
   2. Models that scheme are only developed by actors that can thwart high-priority operations from leading cyber-capable institutions (which precludes current AI developers for at least several years).
   3. AI Control won’t be directly useful in the coming years but it will be indirectly useful to progress the field for when models are developed by actors capable of thwarting top cyber operations.
   4. Even if the model was stolen and caused
Recursive self-improvement in AI probably comes before AGI. Evolution doesn't need to understand human minds to build them, and a parent doesn't need to be an AI researcher to make a child. The bitter lesson and the practice of recent years suggest that building increasingly capable AIs doesn't depend on understanding how they think. Thus the least capable AI that can build superintelligence without human input only needs to be a competent engineer that can scale and refine a sufficiently efficient AI design, in an empirically driven, mundane way that doesn't depend on matching the capabilities of a Grothendieck for conceptual invention. This makes the threshold of AGI less relevant for timelines of recursive self-improvement than I previously expected. With o1 and what straightforwardly follows, we plausibly already have all it takes to get recursive self-improvement, if the current designs get there with the next few years of scaling, and the resulting AIs are merely competent engineers that fail to match humans at less legible technical skills.

Popular Comments

Recent Discussion

This is a good and important point. I don't have a strong opinion on whether you're right, but one counterpoint: AI companies are already well-incentivized to figure out how to control AI, because (as Wei Dai said) controllable AI is more economically useful. It makes more sense for nonprofits / independent researchers to do work that AI companies wouldn't do otherwise.

Mark Xu
see my longer comment https://www.lesswrong.com/posts/A79wykDjr4pcYy9K7/mark-xu-s-shortform#8qjN3Mb8xmJxx59ZG
Mark Xu
I think I disagree with your model of importance. If your goal is to make a sum of numbers small, then you want to focus your efforts where the derivative is lowest (highest? signs are hard), not where the absolute magnitude is highest. The "epsilon fallacy" can be committed in both directions: both in that any negative derivative is worth working on, and that any extremely large number is worth taking a chance to try to improve.

I also separately think that "bottleneck" is not generally a good term to apply to a complex project with high amounts of technical and philosophical uncertainty. The ability to see a "bottleneck" is very valuable should one exist, but I am skeptical of the ability to strongly predict where such bottlenecks will be in advance, and do not think the historical record really supports the ability to find such bottlenecks reliably by "thinking", as opposed to doing a lot of stuff, including trying things and seeing what works. If you have a broad distribution over where a bottleneck might be, then all activities lend value by "derisking" locations for particular bottlenecks if they succeed, and providing more evidence that a bottleneck is in a particular location if they fail. (kinda like: https://en.wikipedia.org/wiki/Swiss_cheese_model)

For instance, I think of "deceptive alignment" as a possible way to get pessimal generalization, and thus a probabilistic "bottleneck" to various alignment approaches. But there are other ways things can fail, and so one can still lend value by solving non-deceptive-alignment-related problems (although my day job consists of trying to get "benign generalization" out of ML, and thus does in fact address that particular bottleneck imo).

I also separately think that if someone thinks they have identified a bottleneck, they should try to go resolve it as best they can. I think of that as what you (John) are doing, and fully support such activities, although I think I am unlikely to join your particular project. I think
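Mark's derivative-versus-magnitude point is easy to see in a toy model. The sketch below is mine, not from the comment: a two-term sum in which the larger term barely responds to effort, so working on the smaller but steeper term reduces the total more.

```python
# Toy model: risk = sum of two terms, each reduced linearly by the effort spent on it.
def total_risk(efforts, sensitivities, baselines):
    return sum(b - s * e for e, s, b in zip(efforts, sensitivities, baselines))

baselines = [10.0, 2.0]       # term 1 dominates the sum in absolute magnitude
sensitivities = [0.1, 1.0]    # ...but term 2 shrinks 10x faster per unit of effort
budget = 1.0

print(total_risk([budget, 0.0], sensitivities, baselines))  # 11.9: all effort on the big term
print(total_risk([0.0, budget], sensitivities, baselines))  # 11.0: all effort on the steep term
```

The magnitude of a term says nothing by itself about the marginal value of working on it; only the derivative per unit of effort does.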
johnswentworth
I do think that "we don't have enough information to know where the bottlenecks are yet" is in-general a reasonable counterargument to a "just focus on the bottlenecks" approach (insofar as we in fact do not yet have enough information). In this case I think we do have enough information, so that's perhaps a deeper crux.

You were prepared for gratitude, a commendation from the Admiral, your own department, parades in your name. You were also prepared to hear that your ‘list of helpful suggestions for ensuring supply ships survive random encounters’ was an impudent insult to the collective intellect of High Command, and receive a public execution for your trouble. What you weren’t prepared for was what happened: being allocated a modest stipend, assigned to a vessel, and told that if you’re so clever you should implement your plans personally.

You have 100gp to spend, and your options are as follows:

Intervention | Cost
Coating the underside of the ship in shark repellent would ensure that no journey would feature shark attacks; however, Vaarsuvius’ Law (“every trip between plot-relevant locations will have exactly one random encounter”) means
...

Vaarsuvius’ Law (“every trip between plot-relevant locations will have exactly one random encounter”)

 

I appreciate the Order of the Stick reference!

Once again, I’ve compiled some statistics on color trends in the spring/summer 2025 ready-to-wear fashion collections!

Background and Methodology

We just got done with “fashion month”, the flurry of activity in the fall when fashion designers release their spring/summer collections for the coming year and display them on runways in New York, London, Milan, and Paris.[1]

Vogue Magazine generously shares many images from the collections — by my count, 13,570 photographs in all, typically each of a different outfit.[2]

My question is a simple one: which colors are most common in the SS25[3] collections in aggregate? And how do they compare to previous years? Are there changing trends in color popularity?

There’s an obvious and boring answer: the most popular color in clothing is always black. Followed by white. But things get more...
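The post doesn't spell out how the colors are counted, so here is only a rough sketch of one way such a tally could be done (the folder name, the coarse palette, and the "most common pixel color" heuristic are all my assumptions; a real analysis would also need to separate the garment from the runway background and lighting):

```python
from collections import Counter
from pathlib import Path
from PIL import Image  # pip install pillow

# Coarse reference palette; a real analysis would use a much finer one.
PALETTE = {
    "black": (20, 20, 20), "white": (240, 240, 240), "grey": (128, 128, 128),
    "red": (200, 30, 40), "blue": (40, 60, 180), "green": (40, 140, 60),
    "yellow": (230, 210, 50), "brown": (120, 80, 50), "pink": (240, 150, 180),
}

def nearest_color(rgb):
    """Map an RGB pixel to the closest palette name by squared distance."""
    return min(PALETTE, key=lambda n: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[n])))

def dominant_color(path):
    """Downsample a photo and return the most common palette color among its pixels."""
    img = Image.open(path).convert("RGB").resize((32, 32))
    return Counter(nearest_color(px) for px in img.getdata()).most_common(1)[0][0]

# Hypothetical folder of collection photos.
tally = Counter(dominant_color(p) for p in Path("ss25_photos").glob("*.jpg"))
print(tally.most_common())
```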

Alicorn

In the SS25 collections, we see it, unsurprisingly, in skin-baring clubwear:

but also in more classically graceful gowns and tailoring:

in slightly kooky and unhinged references to early-60s femininity:

and in more eclectic, playful styles:

Were there supposed to be images or links here?

Last week, ARC released a paper called Towards a Law of Iterated Expectations for Heuristic Estimators, which follows up on previous work on formalizing the presumption of independence. Most of the work described here was done in 2023.

A brief table of contents for this post:

 

In "Formalizing the Presumption of Independence", we defined a heuristic estimator to be a hypothetical algorithm that estimates the values of mathematical expression based on arguments. That is, a heuristic estimator is an algorithm  that takes as input

  • A formally specified real-valued expression ; and
  • A set of formal "arguments"  --

-- and outputs an...
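ARC's heuristic estimator is hypothetical, but its input/output shape is concrete: an expression plus a list of arguments in, a real-valued estimate out. The stub below is only my illustration of that interface; the names and the "sum of corrections" rule are placeholders, not ARC's construction.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Expression:
    """A formally specified real-valued expression (placeholder representation)."""
    description: str
    naive_estimate: float  # e.g. a default "presumption of independence" estimate

@dataclass
class Argument:
    """A formal argument that should move the estimate by some amount."""
    description: str
    correction: float

def heuristic_estimate(y: Expression, arguments: List[Argument]) -> float:
    """Toy stand-in for G(Y | pi_1, ..., pi_m): start from a naive estimate and fold
    in each argument's correction. A real heuristic estimator would derive these
    corrections from the arguments themselves rather than take them as given."""
    return y.naive_estimate + sum(a.correction for a in arguments)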

DanielFilan
Here and in the next dot point, should the inner heuristic estimate be conditioning on a larger set of arguments (perhaps chosen by an unknown method)? Otherwise it seems like you're just expressing some sort of self-knowledge.

Yeah, that's right -- see this section for the full statements.

Short Summary

LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible.

Longer summary

There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks:

  • Whether this limitation is illusory or actually exists.
  • If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs.
  • If fundamental, whether it can be overcome by scaffolding & tooling.

If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI...

I would definitely agree that if scale were the only thing needed, that could drastically shorten the timeline compared to having to invent a completely new paradigm of AI, but even then that wouldn't necessarily make it fast. Pure scale could still be centuries, or even millennia, away, assuming it would even work.

We have enough scaling to see how that works (massively exponential resources for linear gains), and given that extreme errors in reasoning (that are obvious to both experts and laypeople alike) are only lightly abated during massive amounts of ...

As part of our Summer 2024 Program, MATS ran a series of discussion groups focused on questions and topics we believe are relevant to prioritizing research into AI safety. Each weekly session focused on one overarching question, and was accompanied by readings and suggested discussion questions. The purpose of running these discussions was to increase scholars’ knowledge about the AI safety ecosystem and models of how AI could cause a catastrophe, and hone scholars’ ability to think critically about threat models—ultimately, in service of helping scholars become excellent researchers.

The readings and questions were largely based on the curriculum from the Winter 2023-24 Program, with two changes:

  • We reduced the number of weeks, since in the previous cohort scholars found it harder to devote time to discussion groups later
...

I interact with journalists quite a lot and I have specific preferences. Not just for articles, but for behaviour. And journalists do behave pretty strangely at times. 

This account comes from talking to journalists on ~10 occasions, including being quoted in ~5 articles.

Privacy

I do not trust journalists to abide by norms of privacy. If I talk to a friend and, without asking, share what they said with their name attached, I expect they'd be upset. But journalists regularly act as if their profession sets up the opposite norm: that everything is publishable unless explicitly agreed otherwise. This is bizarre to me. It's like they have taken a public oath to be untrustworthy.

Perhaps they would argue that it’s a few bad journalists who behave like this, but how...

I have done this twice. One journalist was happy to accept responsibility and I gave them a quote, another wasn't and I didn't.

This makes it sound like it's the decision of the journalist you are talking to whether or not they are responsible for their headlines. Some outlets have an editorial policy where the journalist has a say in the headline and others don't. Historically, the person setting the page was supposed to choose the headline, as they knew how much space there was for the headline on the page.

Wouldn't it be better to use a standard that's actually under the control of the journalist you are speaking to when deciding whether to speak with them?

Brendan Long
I largely agree with this article but I feel like it won't really change anyone's behavior. Journalists act the way they do because that's what they're rewarded for. And if your heuristic is that all journalists are untrustworthy, it makes it hard for trustworthy journalists to get any benefit from that. A more effective way to change behavior might be to make a public list of journalists who are or aren't trustworthy, with specific information about why ("In [insert URL here], Journalist A asked me for a quote and I said X, but they implied inaccurately that I believe Y" "In [insert URL here], Journalist B thought that I believe P but after I explained that I actually believe Q, they accurately reflected that in the article", or just boring ones like "I said X and they accurately quoted me as saying X", etc.).
gb
I didn't downvote, but I would've hard-disagreed on the "privacy" part if only there were a button for that. It's of course a different story if they're misquoting you, or taking quotes deliberately out of context to mislead. But to quote something you actually said but on second thought would prefer to keep out of publication is... really kind of what journalists need to do to keep people minimally well-informed. Your counterexamples involve communications with family and friends, and it's not very clear to me why the same heuristic should be automatically applied to conversations with strangers. But in any case, not even with the former is your communication "truly" private, as outside of very narrow exceptions like marital privilege, their testimony (on the record, for potentially thousands of people to read too) may generally be compelled under threat of arrest.

Janus, author of Simulators, has a blog with several posts about LLMs which aren't on LessWrong, but which have provided me lots of insight. This one, "Language models are multiverse generators," lays out an analogy (and playful metaphysical frame) in which language models are like laws driving the evolution of Everettian quantum multiverses. It also introduces the now-well-established Loom tool for exploring LLMs, which was apparently partly inspired by this analogy. I'm not sure why Janus didn't post this here; I think it deserves attention from LessWrong.

Other bangers from Janus' blog include Methods of prompt programming, HPMOR 32.5, Prophecies, and Surface Tension.
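For readers who haven't seen Loom, the "multiverse" framing has a very literal computational reading: sample several continuations at every step and you get a branching tree of texts to navigate. The toy below is my own sketch of that idea, not Janus's code; the word-picking sampler is a stand-in for what would really be language-model sampling.

```python
import random

random.seed(0)
WORDS = ["the", "rain", "stopped", "suddenly", "and", "light", "returned"]

def sample_continuation(prefix: str) -> str:
    """Stand-in for one LM sampling step: append one randomly chosen word."""
    return prefix + " " + random.choice(WORDS)

def grow_multiverse(prefix: str, depth: int, branches: int):
    """Build a tree of texts: each node is (text, children), branching at every step."""
    if depth == 0:
        return (prefix, [])
    children = [grow_multiverse(sample_continuation(prefix), depth - 1, branches)
                for _ in range(branches)]
    return (prefix, children)

tree = grow_multiverse("It begins with", depth=3, branches=2)  # 8 leaf "worlds"
```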

I think consequentialism describes only a subset of my wishes. For example, maximizing money is well modeled by it. But when I'm playing with something, it's mostly about the process, not the end result. Or when I want to respect the wishes of other people, I don't really know what end result I'm aiming for, but I can say what I'm willing or unwilling to do.

If I try to shoehorn everything into consequentialism, then I end up looking for "consequentialist permission" to do stuff. Like climbing a mountain: consequentialism says "I can put you on top of the mountain! Oh, that's not what you want? Then I can give you the feeling of having climbed it! You don't want that either? Then this is tricky..." This seems...

deepthoughtlife
Broadly, consequentialism requires us to ignore many of the consequences of choosing consequentialism. And since that is what matters in consequentialism it is to that exact degree self-refuting. Other ethical systems like Deontology and Virtue Ethics are not self-refuting and thus should be preferred to the degree we can't prove similar fatal weaknesses. (Virtue Ethics is the most flexible system to consider, as you can simply include other systems as virtues! Considering the consequences is virtuous, just not the only virtue! Coming up with broadly applicable rules that you follow even when they aren't what you most prefer is a combination of honor and duty, both virtues.)
Ben Livengood
I think consequentialism is the robust framework for achieving goals and I think my top goal is the flourishing of (most, the ones compatible with me) human values. That uses consequentialism as the ultimate lever to move the world but refers to consequences that are (almost) entirely the results of our biology-driven thinking and desiring and existing, at least for now.
Dagon
Quite possibly, but without SOME framework of evaluating wishes, it's hard to know which wishes (even of oneself) to support and which to fight/deprioritize. Humans (or at least this one) often have desires or ideas that aren't, when considered, actually good ideas.  Also, humans (again, at least this one) have conflicting desires, only a subset of which CAN be pursued.   It's not perfect, and it doesn't work when extended too far into the tails (because nothing does), but consequentialism is one of the better options for judging one's desires and picking which to pursue.

This is tricky. In the post I mentioned "playing", where you do stuff without caring about any goal, and most play doesn't lead to anything interesting. But it's amazing how many of humanity's advances were made in this non-goal-directed, playing mode. This is mentioned for example in Feynman's book, the bit about the wobbling plate.